PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations

Abstract

Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA), and recent results in theoretical computer science, we present a novel algorithm that, applied to genome-wide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described dataset (10,805 SNPs, 11 populations), we demonstrate that a very small set of PCA-correlated SNPs can be effectively employed to assign individuals to particular continents or populations, using a simple clustering algorithm. We validate our methods on the HapMap populations and achieve perfect intercontinental differentiation with 14 PCA-correlated SNPs. The Chinese and Japanese populations can be easily differentiated using fewer than 100 PCA-correlated SNPs ascertained after evaluating 1.7 million SNPs from HapMap. We show that, in general, structure-informative SNPs are not portable across geographic regions. However, we manage to identify a general set of 50 PCA-correlated SNPs that effectively assigns individuals to one of nine different populations. Compared to analysis with the measure of informativeness, our methods, although unsupervised, achieved similar results. We proceed to demonstrate that our algorithm can be effectively used for the analysis of admixed populations without having to trace the origin of individuals. Analyzing a Puerto Rican dataset (192 individuals, 7,257 SNPs), we show that PCA-correlated SNPs can be used to successfully predict structure and ancestry proportions. We subsequently validate these SNPs for structure identification in an independent Puerto Rican dataset.
The algorithm that we introduce runs in seconds and can be easily applied to large genome-wide datasets, facilitating the identification of population substructure, stratification assessment in multi-stage whole-genome association studies, and the study of demographic history in human populations.
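
The abstract does not spell out how SNPs are scored, so the following is only a minimal sketch of one plausible reading: each SNP is ranked by the sum of its squared loadings on the top k principal components, and the r highest-scoring SNPs are kept. The function name `pca_correlated_snps`, the parameters `k` and `r`, the synthetic two-population genotype matrix, and the sign-threshold assignment (a stand-in for the "simple clustering algorithm") are all assumptions made for illustration, not the authors' implementation or data.

```python
import numpy as np

def pca_correlated_snps(genotypes, k, r):
    """Rank SNPs (columns) by their squared loadings on the top-k PCs.

    Assumed scoring rule: score_j = sum_{i<k} V[j, i]^2, where the columns
    of V are the right singular vectors of the centered genotype matrix.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # center each SNP
    # Thin SVD; the rows of Vt are the right singular vectors.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = (Vt[:k] ** 2).sum(axis=0)          # one score per SNP
    top = np.argsort(scores)[::-1][:r]          # r highest-scoring SNPs
    return top, scores

# Synthetic data (hypothetical): two populations, 200 SNPs, of which the
# first 10 differ strongly in allele frequency between the populations.
rng = np.random.default_rng(0)
n_per_pop, n_snps, n_informative = 50, 200, 10
freq_a = np.full(n_snps, 0.5)
freq_b = freq_a.copy()
freq_a[:n_informative] = 0.05
freq_b[:n_informative] = 0.95
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))  # genotypes in {0, 1, 2}
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b])

# Select SNPs without using the population labels.
selected, scores = pca_correlated_snps(genotypes, k=1, r=n_informative)

# Re-run PCA on the selected SNPs only and split individuals by the sign
# of their first principal component.
sub = genotypes[:, selected].astype(float)
sub -= sub.mean(axis=0)
U, _, _ = np.linalg.svd(sub, full_matrices=False)
labels = (U[:, 0] > 0).astype(int)

print("selected SNP indices:", sorted(selected.tolist()))
print("labels, population A:", labels[:n_per_pop])
print("labels, population B:", labels[n_per_pop:])
```

Ancestry labels are never passed to the selection step; they are only implied by the row order when inspecting the output. On real data, `k` would need to be chosen from the number of significant principal components, and a proper clustering method would replace the sign threshold.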
